Human Associations Help to Detect Conventionalized Multiword Expressions
In this paper we show that to obtain human evidence about the
conventionalization of a phrase, we should ask native speakers about the
associations they have with the phrase and its component words. We show that
if the component words of a phrase frequently evoke each other as
associations, the phrase can be considered conventionalized. Another type of
conventionalized phrase can be revealed using two factors: low entropy of the
phrase's associations and low overlap between the associations of the
component words and those of the phrase. The association experiments were
performed for the Russian language.
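The two signals described above can be made concrete in a minimal sketch. The association norms, counts, and top-k threshold here are all hypothetical illustrations, not the paper's actual experimental data or procedure:

```python
from collections import Counter
from math import log2

def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of an association response distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

def mutually_associated(w1: str, w2: str, norms: dict, top_k: int = 5) -> bool:
    """True if each component word lists the other among its top-k associations."""
    top1 = [w for w, _ in norms[w1].most_common(top_k)]
    top2 = [w for w, _ in norms[w2].most_common(top_k)]
    return w2 in top1 and w1 in top2

# Toy association norms (invented counts for illustration only).
norms = {
    "red":  Counter({"tape": 40, "color": 20, "blood": 10}),
    "tape": Counter({"red": 35, "sticky": 25, "measure": 15}),
}
# A low-entropy phrase distribution: responses concentrate on few associations.
phrase_assoc = Counter({"bureaucracy": 50, "delay": 30})

print(mutually_associated("red", "tape", norms))  # → True
print(entropy(phrase_assoc))
```

A phrase whose association distribution has low entropy (responses cluster on a few dominant words) and little overlap with its components' associations would, under the paper's second criterion, also be flagged as conventionalized.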
Cross-domain opinion word extraction model
In this paper we present a new approach to domain-specific opinion word extraction in Russian. We propose a set of statistical features and a combination of algorithms that can discriminate opinion words in a particular domain. The extraction model is trained on a movie domain and then applied to four other domains. We evaluate the quality of the obtained sentiment lexicons intrinsically. Finally, our method is adapted to the movie domain in English and demonstrates comparable results.
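The abstract does not spell out the features; as a rough illustration of the general approach, here is a sketch that combines two assumed statistical features (domain-vs-reference frequency ratio and co-occurrence with a small seed lexicon) into a single score for ranking candidate opinion words. All names, weights, and counts are invented for the example:

```python
def freq_ratio(word: str, domain_freq: dict, ref_freq: dict) -> float:
    """How much more frequent the word is in the domain than in a reference corpus."""
    return domain_freq.get(word, 0) / (ref_freq.get(word, 0) + 1)

def seed_cooccurrence(word: str, cooc: dict, seeds: set) -> float:
    """Fraction of sentiment seed words the candidate co-occurs with."""
    return sum(1 for s in seeds if s in cooc.get(word, set())) / len(seeds)

def rank_candidates(words, domain_freq, ref_freq, cooc, seeds):
    """Rank candidates by an (arbitrary) equal-weight combination of the features."""
    def score(w):
        return 0.5 * freq_ratio(w, domain_freq, ref_freq) + \
               0.5 * seed_cooccurrence(w, cooc, seeds)
    return sorted(words, key=score, reverse=True)

# Toy counts (hypothetical, not from the paper's corpora).
domain_freq = {"gripping": 8, "camera": 5, "boring": 6}
ref_freq = {"camera": 9, "boring": 1}
cooc = {"gripping": {"good"}, "boring": {"bad"}, "camera": set()}
seeds = {"good", "bad"}
print(rank_candidates(["gripping", "camera", "boring"], domain_freq, ref_freq, cooc, seeds))
```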
Attention-Based Neural Networks for Sentiment Attitude Extraction using Distant Supervision
In the sentiment attitude extraction task, the aim is to identify
sentiment relations between entities mentioned in text. In
this paper, we provide a study on attention-based context encoders in the
sentiment attitude extraction task. For this task, we adapt attentive context
encoders of two types: (1) feature-based; (2) self-based. In our study, we
utilize the corpus of Russian analytical texts RuSentRel and automatically
constructed news collection RuAttitudes for enriching the training set. We
consider the problem of attitude extraction as two-class (positive, negative)
and three-class (positive, negative, neutral) classification tasks for whole
documents. Our experiments with the RuSentRel corpus show that the three-class
classification models that employ the RuAttitudes corpus for training gain a
10% increase in F1, with an extra 3% when the model architectures include the
attention mechanism. We also analyze attention weight distributions depending
on the term type. Comment: 10 pages, 9 figures. The preprint of an article published in the
proceedings of the 10th International Conference on Web Intelligence, Mining
and Semantics (WIMS 2020). The final authenticated publication is available
online at https://doi.org/10.1145/3405962.3405985. arXiv admin note:
substantial text overlap with arXiv:2006.1160
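The abstract does not detail the attentive encoders themselves. As a rough illustration of the feature-based variant, the sketch below weights context term vectors by their similarity to a query vector (standing in for an attitude participant's embedding) before pooling. The vectors, dimensions, and names are invented for the example, not taken from the paper:

```python
from math import exp

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attentive_pool(context_vecs, query_vec):
    """Score each context term by dot product with the query, normalize the
    scores with softmax, and return the weighted average of the term vectors."""
    scores = [sum(q * c for q, c in zip(query_vec, vec)) for vec in context_vecs]
    weights = softmax(scores)
    dim = len(context_vecs[0])
    pooled = [sum(w * vec[i] for w, vec in zip(weights, context_vecs))
              for i in range(dim)]
    return pooled, weights

# Toy 3-term context with hypothetical 2-d embeddings; the query stands in
# for one of the entities between which an attitude is extracted.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
entity_query = [1.0, 0.2]
pooled, weights = attentive_pool(context, entity_query)
print([round(w, 3) for w in weights])
```

The term most similar to the query receives the largest weight; inspecting these weights per term type is the kind of analysis the abstract mentions.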
An Approach to New Ontologies Development: Main Ideas and Simulation Results
In this paper we consider a technology for developing ontologies for new domains. We discuss the main principles of ontology development, automatic methods for extracting terms from domain texts, and the types of ontology relations.
RuSentNE-2023: Evaluating Entity-Oriented Sentiment Analysis on Russian News Texts
The paper describes the RuSentNE-2023 evaluation devoted to targeted
sentiment analysis in Russian news texts. The task is to predict sentiment
towards a named entity in a single sentence. The dataset for RuSentNE-2023
evaluation is based on the Russian news corpus RuSentNE, which has rich
sentiment-related annotation: the corpus is annotated with named entities and
with sentiments towards these entities, along with related effects and emotional
states. The evaluation was organized using the CodaLab competition framework.
The main evaluation measure was the macro-averaged F-measure over the positive
and negative classes. The best result achieved was a 66% macro F-measure
(Positive+Negative classes). We also tested ChatGPT on the test set from our
evaluation and found that its zero-shot answers reached 60% F-measure, which
corresponds to 4th place in the evaluation and can be considered quite high
for a zero-shot application. ChatGPT also provided detailed explanations of
its conclusions. Comment: 12 pages, 5 tables, 3 figures
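The evaluation measure above can be sketched directly: per-class F1 for the positive and negative classes, macro-averaged, with the neutral class excluded from the average. The toy labels below are invented for illustration:

```python
def f1(gold, pred, label):
    """F1 for a single class from gold and predicted label sequences."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def macro_f1_pos_neg(gold, pred):
    """Macro-average F1 over the positive and negative classes only;
    neutral predictions affect precision/recall but not the average itself."""
    return (f1(gold, pred, "pos") + f1(gold, pred, "neg")) / 2

# Toy gold/predicted labels (hypothetical).
gold = ["pos", "neg", "neu", "pos", "neg"]
pred = ["pos", "neu", "neu", "neg", "neg"]
print(round(macro_f1_pos_neg(gold, pred), 3))  # → 0.583
```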
TatWordNet: A Linguistic Linked Open Data-Integrated WordNet Resource for Tatar
We present the first release of TatWordNet (http://wordnet.tatar), a wordnet resource for Tatar. TatWordNet has been constructed by the combination of the expand and the merge approaches. The synsets of TatWordNet have been compiled by: (i) the automatic conversion of concepts of TatThes, a socio-political Tatar; (ii) semi-automatic translation of synsets of RuWordNet, a wordnet resource for Russian with the followed manual verification and correction; (iii) manual translation of base RuWordNet synsets; (iv) and manual translation of the all hypernyms of the previously translated RuWordNet synsets. The currents version of TatWordNet contains 18,583 synsets, 36,540 lexical entries and 49,525 senses. The resource has been published to the Linguistic Linked Open Data cloud and interlinked with the Global WordNet Grid
RUSSE'2018 : a shared task on word sense induction for the Russian language
The paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic languages, such as rich morphology and free word order. The participants were asked to group contexts of a given word in accordance with its senses, which were not provided beforehand. For instance, given the word “bank” and a set of contexts with this word, e.g. “bank is a financial institution that accepts deposits” and “river bank is a slope beside a body of water”, a participant was asked to cluster such contexts into a number of clusters, unknown in advance, corresponding to, in this case, the “company” and the “area” senses of the word “bank”. For the purpose of this evaluation campaign, we developed three new evaluation datasets based on sense inventories with different sense granularity. The contexts in these datasets were sampled from Wikipedia texts, the academic corpus of Russian, and an explanatory dictionary of Russian. Overall, 18 teams participated in the competition, submitting 383 models. Multiple teams managed to substantially outperform competitive state-of-the-art baselines from previous years based on sense embeddings.
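The clustering task in the “bank” example can be illustrated with a deliberately simple sketch: bag-of-words context vectors, Jaccard similarity, and a greedy two-cluster assignment seeded by the most dissimilar pair. This is only a toy baseline for the task definition, not any participant's system (real submissions used sense embeddings and stronger clustering):

```python
STOP = {"bank", "is", "a", "the", "of", "with"}  # tiny ad-hoc stoplist

def bow(context: str) -> set:
    """Bag-of-words set for a context, dropping the target word and stopwords."""
    return {w for w in context.lower().split() if w not in STOP}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_two(contexts):
    """Greedy 2-clustering: seed with the two least similar contexts, then
    assign each context to the nearer seed by Jaccard similarity."""
    vecs = [bow(c) for c in contexts]
    pairs = [(jaccard(vecs[i], vecs[j]), i, j)
             for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    _, s1, s2 = min(pairs)  # least-similar pair becomes the seeds
    return [0 if jaccard(v, vecs[s1]) >= jaccard(v, vecs[s2]) else 1
            for v in vecs]

contexts = [
    "bank is a financial institution that accepts deposits",
    "the bank approved a loan for deposits and savings",
    "river bank is a slope beside a body of water",
    "we walked along the river bank near the water",
]
print(cluster_two(contexts))  # → [0, 0, 1, 1]
```

The first two contexts share “deposits” and land in the “company” cluster; the last two share “river”/“water” and land in the “area” cluster. Real WSI systems must also induce the number of clusters, which this sketch fixes at two.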